Abstract: We establish the convergence of the min-sum message passing algorithm for minimization of a quadratic objective function given a convex decomposition. Our results also apply to the equivalent problem of the convergence of Gaussian belief propagation.

Index Terms: message-passing algorithms, decentralized optimization



I. Introduction 

CONSIDER an optimization problem that is characterized by a set $\mathcal{X}$ and a hypergraph $(V, \mathcal{C})$. There are $|V|$ decision variables; each is associated with a vertex $i \in V$ and takes values in the set $\mathcal{X}$. The set $\mathcal{C}$ is a collection of subsets (or, "hyperedges") of the vertex set $V$; each hyperedge $C \in \mathcal{C}$ is associated with a real-valued "component function" (or, "factor") $f_C : \mathcal{X}^{|C|} \to \mathbb{R}$. The optimization problem takes the form

$$\min_{x \in \mathcal{X}^{|V|}} f(x),$$

where

$$f(x) = \sum_{C \in \mathcal{C}} f_C(x_C).$$

Here, $x_C \in \mathcal{X}^{|C|}$ is the vector of variables associated with
vertices in the subset C. We refer to an optimization program 
of this form as a graphical model. While this formulation may 
seem overly broad — indeed, almost any optimization problem 
can be cast in this framework — we are implicitly assuming 
that the graph is sparse and that the hyperedges are small. 

Over the past few years, there has been significant interest 
in a heuristic optimization algorithm for graphical models. We 
will call this algorithm the min-sum message passing algo- 
rithm, or the min-sum algorithm, for short. This is equivalent 
to the so-called max-product algorithm, also known as belief 
revision, and is closely related to the sum-product algorithm, 
also known as belief propagation. Interest in such algorithms 
has to a large extent been triggered by the success of 
message passing algorithms for decoding low-density parity-check
codes and turbo codes [1], [2], [3]. Message passing
algorithms are now used routinely to solve NP-hard decoding
problems in communication systems. It was a surprise that this
simple and efficient approach yields satisfactory solutions.

The majority of the literature has focused on the case where the set $\mathcal{X}$ is discrete and the resulting optimization problem is combinatorial in nature. We, however, are interested in the case where $\mathcal{X} = \mathbb{R}$ and the optimization problem is continuous. In particular, many continuous optimization problems that are traditionally approached using methods of linear programming, convex programming, etc. also possess graphical structure, with objectives defined by sums of component functions. We believe the min-sum algorithm leverages this graphical structure in a way that can complement traditional optimization algorithms, and that combining strengths will lead to algorithms that are able to scale to larger instances of linear and convex programs.

One continuous case that has been considered in the litera- 
ture is that of pairwise quadratic graphical models. Here, the 
objective function is a positive definite quadratic function 

$$f(x) = \frac{1}{2} x^\top \Gamma x - h^\top x, \qquad \Gamma \succ 0. \tag{1}$$

This function is decomposed in a pairwise fashion according to an undirected graph $(V, E)$, so that

$$f(x) = \sum_{i \in V} f_i(x_i) + \sum_{(i,j) \in E} f_{ij}(x_i, x_j),$$

where the functions $\{f_i(\cdot), f_{ij}(\cdot, \cdot)\}$ are quadratic. It has been
shown that, if the min-sum algorithm converges, it computes 
the global minimum of the quadratic [4], [5], [6]. The question
of convergence, however, has proved difficult. Sufficient condi- 
tions for convergence have been established H, Q, but these 
conditions are abstract and difficult to verify. Convergence has 
also been established for classes of quadratic programs arising 
in certain applications [7], [8].

In recent work, Johnson et al. [9], [10] have introduced the notion of walk-summability for pairwise quadratic graphical models. They establish convergence of the min-sum algorithm for walk-summable pairwise quadratic graphical models when the particular set of component functions

$$f_{ij}(x_i, x_j) = \Gamma_{ij} x_i x_j, \quad \forall\, (i,j) \in E, \tag{2}$$

is employed by the algorithm and the algorithm is initialized with zero-valued messages. Further, they give examples
outside this class for which the min-sum algorithm does not 
converge. 

Note that there may be many ways to decompose a given 
objective function into component functions. The min-sum 
algorithm takes the specification of component functions as 
an input and exhibits different behavior for different decom- 
positions of the same objective function. Alternatively, the 
choice of a decomposition can be seen to be equivalent to the choice of initial conditions for the min-sum algorithm [6], [11]. A limitation of the convergence result of Johnson et al. [9], [10] is that it requires use of a particular decomposition of the objective function of the form (2), with zero-valued initial messages. The analysis presented does not hold in other situations. For example, the result does not establish convergence of the min-sum algorithm in the applied context considered in [7].

We will study the convergence of the min-sum algorithm 
given a convex decomposition: 

Definition 1. (Convex Decomposition) 

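A convex decomposition of a quadratic function $f(\cdot)$ is a set of quadratic functions $\{f_i(\cdot), f_{ij}(\cdot, \cdot)\}$ such that

$$f(x) = \sum_{i \in V} f_i(x_i) + \sum_{(i,j) \in E} f_{ij}(x_i, x_j),$$

each function $f_i(\cdot)$ is strictly convex, and each function $f_{ij}(\cdot, \cdot)$ is convex (although not necessarily strictly so).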

We will say that a quadratic objective function is convex decomposable if there exists a convex decomposition. This condition implies strict convexity of the quadratic objective function; however, not all strictly convex quadratic functions are convex decomposable.

The primary contribution of this paper is in establishing 
that the min-sum algorithm converges given any convex de- 
composition or even decompositions that are in some sense 
"dominated" by convex decompositions. This result can be 
equivalently restated as a sufficient condition on the initial 
messages used in the min-sum algorithm. Convergence is 
established under both synchronous and asynchronous models 
of computation. We believe that this is the most general 
convergence result available for the min-sum algorithm with 
a quadratic objective function. 

The walk-summability condition of Johnson et al. is equivalent to the existence of a convex decomposition [10]. In this way, our work can be viewed as a generalization of their convergence results to a broad class of decompositions or initial conditions. This generalization is of more than purely theoretical interest. The decentralized and asynchronous settings in which such optimization algorithms are deployed are typically dynamic. Consider, for example, a sensor network which seeks to estimate some environmental phenomena by solving an optimization problem of the form (1). As sensors are added or removed from the network, the objective function in (1) will change slightly. Reinitializing the optimization algorithm after each such change would require synchronization across the entire network and a large delay to allow the algorithm to converge. If the change in the objective function is small, it is likely that the change in the optimum of the optimization problem is also small. Hence, using the current state of the algorithm (the set of messages) as an initial condition may result in much quicker convergence. In this way, understanding the robustness of the min-sum algorithm over different initial conditions is important to assessing its practical value.

Beyond this, however, our work suggests a path towards understanding the convergence of the min-sum algorithm in the context of general convex (i.e., not necessarily quadratic) objective functions. The notion of a convex decomposition is easily generalized, while it is not clear how to interpret the walk-summability condition or a decomposition of the form (2) in the general convex case. In follow-on work [12], we have been able to establish such a generalization and develop conditions for the convergence of the min-sum algorithm in a broad range of general convex optimization problems. When specialized to the quadratic case, however, those results are not as general as the results presented herein.

The optimization of quadratic graphical models can be 
stated as a problem of inference in Gaussian graphical mod- 
els. In this case, the min-sum algorithm is mathematically 
equivalent to the sum-product algorithm (belief propagation), or
the max-product algorithm. Our results therefore also apply to 
Gaussian belief propagation. However, since Gaussian belief 
propagation, in general, computes marginal distributions that 
have correct means but incorrect variances, we believe that the 
optimization perspective is more appropriate than the inference 
perspective. As such, we state our results in the language of 
optimization. 

Finally, note that the solution of quadratic programs of the form (1) is equivalent to the solution of the sparse, symmetric, positive definite linear system $\Gamma x = h$. This is a well-studied problem with an extensive literature. The important feature of the min-sum algorithm in this context is that it is decentralized and totally asynchronous. The comparable algorithms from the literature fall into the class of classical iterative methods, such as the Jacobi method or the Gauss-Seidel method [13]. In an optimization context, these methods can be interpreted as local search algorithms, such as gradient descent or coordinate descent. While these methods are quite robust, they suffer from a notoriously slow rate of convergence. Our hope is that message-passing algorithms will provide faster decentralized solutions to such problems than methods based on local search. In application contexts where a comparison can be made [7], preliminary results show that this may indeed be the case.



II. The Min-Sum Algorithm 

Consider a connected undirected graph with vertices $V = \{1, \ldots, n\}$ and edges $E$. Let $N(i)$ denote the set of neighbors of a vertex $i$. Consider an objective function $f : \mathbb{R}^n \to \mathbb{R}$ that decomposes according to pairwise cliques of $(V, E)$; that is

$$f(x) = \sum_{i \in V} f_i(x_i) + \sum_{(i,j) \in E} f_{ij}(x_i, x_j). \tag{3}$$

The min-sum algorithm attempts to minimize $f(\cdot)$ by an iterative, message passing procedure. In particular, at time $t$, each vertex $i$ keeps track of a "message" from each neighbor $u \in N(i)$. This message takes the form of a function $J^{(t)}_{u \to i} : \mathbb{R} \to \mathbb{R}$. These incoming messages are combined to compute new outgoing messages for each neighbor. In particular, the message $J^{(t+1)}_{i \to j}(\cdot)$ from vertex $i$ to vertex $j \in N(i)$ evolves according to

$$J^{(t+1)}_{i \to j}(x_j) = \kappa + \min_{y_i} \Big( f_i(y_i) + f_{ij}(y_i, x_j) + \sum_{u \in N(i) \setminus j} J^{(t)}_{u \to i}(y_i) \Big). \tag{4}$$

Here, $\kappa$ represents an arbitrary offset term that varies from message to message. Only the relative values of the function $J^{(t)}_{i \to j}(\cdot)$ matter, so $\kappa$ does not influence relevant information. Its purpose is to keep messages finite. One approach is to select $\kappa$ so that $J^{(t)}_{i \to j}(0) = 0$. The functions $\{J^{(0)}_{i \to j}(\cdot)\}$ are initialized arbitrarily; a common choice is to set $J^{(0)}_{i \to j}(\cdot) = 0$ for all messages.

At time $t$, each vertex $j$ forms a local objective function $f_j^{(t)}(\cdot)$ by combining incoming messages according to

$$f_j^{(t)}(x_j) = \kappa + f_j(x_j) + \sum_{i \in N(j)} J^{(t)}_{i \to j}(x_j).$$

The vertex then generates a running estimate of the $j$th component of an optimal solution to the original problem according to

$$x_j^{(t)} = \operatorname*{argmin}_{y_j} f_j^{(t)}(y_j).$$

By dynamic programming arguments, it is easy to see that this procedure converges and is exact given a convex decomposition when the graph $(V, E)$ is a tree. We are interested in the case where the graph has arbitrary topology.

A. Reparameterizations 

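An alternative way to view iterates of the min-sum algorithm is as a series of "reparameterizations" of the objective function $f(\cdot)$ [6], [11]. Each reparameterization corresponds to a different decomposition of the objective function. In particular, at each time $t$, we define a function $f_j^{(t)} : \mathbb{R} \to \mathbb{R}$, for each vertex $j \in V$, and a function $f_{ij}^{(t)} : \mathbb{R}^2 \to \mathbb{R}$, for each edge $(i,j) \in E$, so that

$$f(x) = \sum_{i \in V} f_i^{(t)}(x_i) + \sum_{(i,j) \in E} f_{ij}^{(t)}(x_i, x_j).$$

The functions evolve jointly according to

$$\begin{aligned}
f_j^{(t+1)}(x_j) &= \kappa + f_j^{(t)}(x_j) + \sum_{i \in N(j)} \min_{y_i} \big( f_i^{(t)}(y_i) + f_{ij}^{(t)}(y_i, x_j) \big), \\
f_{ij}^{(t+1)}(x_i, x_j) &= \kappa + f_{ij}^{(t)}(x_i, x_j) - \min_{y_i} \big( f_i^{(t)}(y_i) + f_{ij}^{(t)}(y_i, x_j) \big) - \min_{y_j} \big( f_j^{(t)}(y_j) + f_{ij}^{(t)}(x_i, y_j) \big).
\end{aligned} \tag{5}$$

They are initialized at time $t = 0$ according to

$$\begin{aligned}
f_i^{(0)}(x_i) &= \kappa + f_i(x_i) + \sum_{j \in N(i)} J^{(0)}_{j \to i}(x_i), \\
f_{ij}^{(0)}(x_i, x_j) &= \kappa + f_{ij}(x_i, x_j) - J^{(0)}_{j \to i}(x_i) - J^{(0)}_{i \to j}(x_j).
\end{aligned}$$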



In the common case, where the functions $\{J^{(0)}_{i \to j}(\cdot)\}$ are all set to zero, the initial component functions $\{f_i^{(0)}(\cdot), f_{ij}^{(0)}(\cdot, \cdot)\}$ are identical to $\{f_i(\cdot), f_{ij}(\cdot, \cdot)\}$, modulo constant offsets. A running estimate of the $j$th component of an optimal solution to the original problem is generated according to

$$x_j^{(t)} = \operatorname*{argmin}_{y_j} f_j^{(t)}(y_j). \tag{6}$$



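The message passing interpretation and the reparameterization interpretation can be related by

$$\begin{aligned}
f_{ij}^{(t)}(x_i, x_j) &= \kappa + f_{ij}(x_i, x_j) - J^{(t)}_{j \to i}(x_i) - J^{(t)}_{i \to j}(x_j), \\
f_j^{(t)}(x_j) &= \kappa + f_j(x_j) + \sum_{i \in N(j)} J^{(t)}_{i \to j}(x_j).
\end{aligned}$$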



These relations are easily established by induction on t. 
As they indicate, the message passing interpretation and the 
reparameterization interpretation are completely equivalent in 
the sense that convergence of one implies convergence of the 
other, and that they compute the same estimates of an optimal 
solution to the original optimization problem. 

Reparameterizations are more convenient for our purposes for the following reason: Note that the decomposition (3) of the objective $f(\cdot)$ is not unique. Indeed, many alternate factorizations can be obtained by moving mass between the single vertex functions $\{f_i(\cdot)\}$ and the pairwise functions $\{f_{ij}(\cdot, \cdot)\}$. Since the message passing update (4) depends on the factorization, this would seem to suggest that each choice of factorization results in a different algorithm. However, in the reparameterization interpretation, the choice of factorization only enters via the initial conditions. Moreover, it is clear that the choice of factorization is equivalent to the initial choice of messages $\{J^{(0)}_{i \to j}(\cdot)\}$. Our results will identify sufficient conditions on these choices so that the min-sum algorithm converges.

III. The Quadratic Case 

We are concerned with the case where the objective function $f$ is quadratic, i.e.,

$$f(x) = \frac{1}{2} x^\top \Gamma x - h^\top x.$$

Here, $\Gamma \in \mathbb{R}^{n \times n}$ is a symmetric, positive definite matrix and $h \in \mathbb{R}^n$ is a vector. Since $f$ must decompose relative to the graph $(V, E)$ according to (3), the non-diagonal entries must satisfy $\Gamma_{ij} = 0$ if $(i,j) \notin E$. Without loss of generality, we will assume that $\Gamma_{ij} \neq 0$ for all $(i,j) \in E$ (otherwise, each such edge can be deleted from the graph) and that $\Gamma_{ii} = 1$ for all $i \in V$ (otherwise, the variables can be rescaled so that this is true).

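Let $\vec{E} \subset V \times V$ be the set of directed edges. That is, $(i,j) \in E$ iff $\{i,j\} \in \vec{E}$, and $\{i,j\} \in \vec{E}$ iff $\{j,i\} \in \vec{E}$. (We use braces and parentheses to distinguish directed and undirected edges, respectively.) Quadratic component functions $\{f_i(\cdot), f_{ij}(\cdot, \cdot)\}$ that sum to $f(\cdot)$ can be parameterized by two vectors of parameters, $\gamma = (\gamma_{ij}) \in \mathbb{R}^{|\vec{E}|}$ and $z = (z_{ij}) \in \mathbb{R}^{|\vec{E}|}$, according to

$$\begin{aligned}
f_{ij}(x_i, x_j) &= \frac{1}{2} \big( \gamma_{ji} \Gamma_{ij}^2 x_i^2 + 2 \Gamma_{ij} x_i x_j + \gamma_{ij} \Gamma_{ij}^2 x_j^2 \big) - z_{ji} x_i - z_{ij} x_j, \\
f_j(x_j) &= \frac{1}{2} \Big( 1 - \sum_{i \in N(j)} \Gamma_{ij}^2 \gamma_{ij} \Big) x_j^2 - \Big( h_j - \sum_{i \in N(j)} z_{ij} \Big) x_j.
\end{aligned}$$

Given such a representation, we will refer to the components of $\gamma$ as the quadratic parameters and the components of $z$ as the linear parameters.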

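Iterates $\{f_i^{(t)}(\cdot), f_{ij}^{(t)}(\cdot, \cdot)\}$ of the min-sum algorithm can be represented by quadratic parameters $\gamma^{(t)}$ and linear parameters $z^{(t)}$. By explicit computation of the minimizations involved in the reparameterization update (5), we can rewrite the update equations in terms of the parameters $\gamma^{(t)}$ and $z^{(t)}$. In particular, if $\sum_{u \in N(i) \setminus j} \Gamma_{ui}^2 \gamma_{ui}^{(t)} < 1$, then

$$\gamma_{ij}^{(t+1)} = \frac{1}{1 - \sum_{u \in N(i) \setminus j} \Gamma_{ui}^2 \gamma_{ui}^{(t)}}, \tag{7}$$

$$z_{ij}^{(t+1)} = \frac{\Gamma_{ij}}{1 - \sum_{u \in N(i) \setminus j} \Gamma_{ui}^2 \gamma_{ui}^{(t)}} \Big( h_i - \sum_{u \in N(i) \setminus j} z_{ui}^{(t)} \Big). \tag{8}$$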
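If, on the other hand, $\sum_{u \in N(i) \setminus j} \Gamma_{ui}^2 \gamma_{ui}^{(t)} \geq 1$, then the minimization

$$\min_{y_i} \; f_i^{(t)}(y_i) + f_{ij}^{(t)}(y_i, x_j)$$

is unbounded and the update equation is ill-posed. Further, the estimate of the $j$th component of the optimal solution, defined by (6), becomes

$$x_j^{(t)} = \frac{1}{1 - \sum_{i \in N(j)} \Gamma_{ij}^2 \gamma_{ij}^{(t)}} \Big( h_j - \sum_{i \in N(j)} z_{ij}^{(t)} \Big),$$

when $\sum_{i \in N(j)} \Gamma_{ij}^2 \gamma_{ij}^{(t)} < 1$, and is ill-posed otherwise.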
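
The parameter updates (7) and (8), together with the estimate formula above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `min_sum_quadratic`, the dense list-of-lists storage of $\Gamma$, and the zero initialization of both parameter vectors are choices made for the example. It assumes $\Gamma$ is symmetric with unit diagonal and that the update is written as $z_{ij} \leftarrow \Gamma_{ij}(h_i - \sum_{u} z_{ui}) / (1 - \sum_{u} \Gamma_{ui}^2 \gamma_{ui})$, with both sums over $u \in N(i) \setminus j$.

```python
def min_sum_quadratic(Gamma, h, iters=200):
    """Synchronous min-sum for f(x) = 0.5 x'Gamma x - h'x (unit-diagonal Gamma).

    Returns the running estimates after `iters` synchronous iterations.
    Parameters are stored per directed edge (i, j); both start at zero
    (an assumption of this sketch).
    """
    n = len(h)
    nbrs = [[j for j in range(n) if j != i and Gamma[i][j] != 0.0]
            for i in range(n)]
    edges = [(i, j) for i in range(n) for j in nbrs[i]]  # directed edges
    gamma = {e: 0.0 for e in edges}  # quadratic parameters
    z = {e: 0.0 for e in edges}      # linear parameters

    for _ in range(iters):
        new_gamma, new_z = {}, {}
        for (i, j) in edges:
            # Sum over neighbors u of i other than j.
            s = sum(Gamma[u][i] ** 2 * gamma[(u, i)] for u in nbrs[i] if u != j)
            if s >= 1.0:
                raise ValueError("update is ill-posed on edge %r" % ((i, j),))
            lin = h[i] - sum(z[(u, i)] for u in nbrs[i] if u != j)
            new_gamma[(i, j)] = 1.0 / (1.0 - s)           # update (7)
            new_z[(i, j)] = Gamma[i][j] * lin / (1.0 - s)  # update (8)
        gamma, z = new_gamma, new_z

    estimates = []
    for j in range(n):
        s = sum(Gamma[i][j] ** 2 * gamma[(i, j)] for i in nbrs[j])
        estimates.append((h[j] - sum(z[(i, j)] for i in nbrs[j])) / (1.0 - s))
    return estimates
```

On a small cyclic graph, for instance a 3-cycle with all off-diagonal entries equal to 0.3, the returned estimates can be compared against a direct solve of $\Gamma x = h$.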

We define a generalization to the notion of a convex 
decomposition. 

Definition 2. (Convex-Dominated Decomposition) 

A convex-dominated decomposition of a quadratic function $f(\cdot)$ is a set of quadratic functions $\{f_i(\cdot), f_{ij}(\cdot, \cdot)\}$ that form a decomposition of $f(\cdot)$, such that for some convex decomposition $\{g_i(\cdot), g_{ij}(\cdot, \cdot)\}$,

$$g_{ij}(x_i, x_j) - f_{ij}(x_i, x_j)$$

is convex, for all edges $(i,j) \in E$.

Note that any convex decomposition is also convex-dominated. 
The following theorem is the main result of this paper. 

Theorem 1. (Quadratic Min-Sum Convergence) 

If $f(\cdot)$ is convex decomposable and $\{f_i^{(0)}(\cdot), f_{ij}^{(0)}(\cdot, \cdot)\}$ is a convex-dominated decomposition, then the quadratic parameters $\gamma^{(t)}$, the linear parameters $z^{(t)}$, and the running estimates $x^{(t)}$ converge. Moreover,

$$\lim_{t \to \infty} f(x^{(t)}) = \min_x f(x).$$

This result is more general than required to capture the "typical" situation. In particular, consider a situation where a problem formulation gives rise to component functions $\{f_i(\cdot), f_{ij}(\cdot, \cdot)\}$ that form a convex decomposition of an objective function $f$. Then, initialize the min-sum algorithm with $\{f_i^{(0)}(\cdot), f_{ij}^{(0)}(\cdot, \cdot)\} = \{f_i(\cdot), f_{ij}(\cdot, \cdot)\}$. Since the initial iterate is a convex decomposition, it certifies that $f(\cdot)$ is convex decomposable, and it is also a convex-dominated decomposition.

We will prove Theorem 1 in Section VI. Before doing so, we will study the parameter sequences $\gamma^{(t)}$ and $z^{(t)}$ independently.

IV. Convergence of Quadratic Parameters 

The update (7) for the quadratic parameters $\gamma^{(t)}$ does
not depend on the linear parameters $z^{(t)}$. Hence, it is natural
to study their evolution independently, as in J5j , Q. In this 
section, we establish existence and uniqueness of a fixed point
of the update (7). Further, we characterize initial conditions
under which $\gamma^{(t)}$ converges to this fixed point.

Whether or not a decomposition is convex depends on the quadratic parameters but not the linear ones. Let $\mathcal{V}$ be the set of quadratic parameters $\gamma \in \mathbb{R}^{|\vec{E}|}$ that correspond to convex decompositions.
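
Membership in the set of quadratic parameters that correspond to convex decompositions can be checked directly. The sketch below is hypothetical and rests on an assumed parameterization: the pairwise term $f_{ij}$ has quadratic part $\frac{1}{2}(\gamma_{ji}\Gamma_{ij}^2 x_i^2 + 2\Gamma_{ij} x_i x_j + \gamma_{ij}\Gamma_{ij}^2 x_j^2)$, and the single-vertex term $f_j$ has curvature $1 - \sum_{i \in N(j)} \Gamma_{ij}^2 \gamma_{ij}$. Under that assumption, $f_{ij}$ is convex exactly when its $2 \times 2$ Hessian is positive semidefinite, and $f_j$ is strictly convex exactly when its curvature is positive. The name `in_convex_set` is introduced here for illustration.

```python
def in_convex_set(Gamma, gamma):
    """Check whether quadratic parameters describe a convex decomposition.

    `gamma` is a dict keyed by directed edge (i, j). The test uses the
    parameterization assumed in the accompanying text (an assumption of
    this sketch, not a statement of the paper's exact formulas).
    """
    n = len(Gamma)
    nbrs = [[j for j in range(n) if j != i and Gamma[i][j] != 0.0]
            for i in range(n)]

    # Pairwise terms: Hessian [[a, Gamma_ij], [Gamma_ij, b]] must be PSD.
    for i in range(n):
        for j in nbrs[i]:
            if i < j:
                g2 = Gamma[i][j] ** 2
                a = gamma[(j, i)] * g2  # coefficient of x_i^2
                b = gamma[(i, j)] * g2  # coefficient of x_j^2
                if a < 0.0 or b < 0.0 or a * b < g2:
                    return False

    # Single-vertex terms: curvature must be strictly positive.
    for j in range(n):
        curvature = 1.0 - sum(Gamma[i][j] ** 2 * gamma[(i, j)] for i in nbrs[j])
        if curvature <= 0.0:
            return False
    return True
```

For a 3-cycle with off-diagonal entries 0.3, a uniform choice of 4 on every directed edge passes both tests, while uniform choices of 1 (pairwise terms not convex) or 6 (vertex terms not strictly convex) fail.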

We have the following theorem establishing convergence 
for the quadratic parameters. The proof relies on certain 
monotonicity properties of the update (7), and extends the
method developed in O, Q. 

Theorem 2. (Quadratic Parameter Convergence) 

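Assume that $f(\cdot)$ is convex decomposable. The system of equations

$$\gamma_{ij} = \frac{1}{1 - \sum_{u \in N(i) \setminus j} \Gamma_{ui}^2 \gamma_{ui}}, \quad \forall\, \{i,j\} \in \vec{E},$$

has a solution $\gamma^*$ such that

$$0 < \gamma^* < v, \quad \forall\, v \in \mathcal{V}.$$

Moreover, $\gamma^*$ is the unique such solution.

If we initialize the min-sum algorithm so that $\gamma^{(0)} \leq v$, for some $v \in \mathcal{V}$, then $0 < \gamma^{(t)} < v$, for all $t > 0$, and

$$\lim_{t \to \infty} \gamma^{(t)} = \gamma^*.$$

Proof: See Appendix A. ■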
The key condition for the convergence is that the initial quadratic parameters $\gamma^{(0)}$ must be dominated by those of a convex decomposition. Such initial conditions are easy to find; for example, $\gamma^{(0)} = 0$ or $\gamma^{(0)} \in \mathcal{V}$ satisfy this requirement.
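
The behavior described by Theorem 2 can be observed numerically by iterating the quadratic-parameter update from a dominated starting point. The following is a minimal sketch, assuming the update has the form $\gamma_{ij} \leftarrow 1 / (1 - \sum_{u \in N(i) \setminus j} \Gamma_{ui}^2 \gamma_{ui})$; the helper name `iterate_gamma` and the dictionary representation are hypothetical.

```python
def iterate_gamma(Gamma, gamma0, iters):
    """Apply the quadratic-parameter update `iters` times.

    Returns the list of all iterates, starting with a copy of `gamma0`.
    `gamma0` is a dict keyed by directed edge (i, j).
    """
    n = len(Gamma)
    nbrs = [[j for j in range(n) if j != i and Gamma[i][j] != 0.0]
            for i in range(n)]
    history = [dict(gamma0)]
    for _ in range(iters):
        cur, nxt = history[-1], {}
        for (i, j) in cur:
            s = sum(Gamma[u][i] ** 2 * cur[(u, i)] for u in nbrs[i] if u != j)
            if s >= 1.0:
                raise ValueError("update is ill-posed on edge %r" % ((i, j),))
            nxt[(i, j)] = 1.0 / (1.0 - s)
        history.append(nxt)
    return history
```

On a 3-cycle with off-diagonal entries 0.3, starting from zero the iterates increase toward the fixed point $10/9$ on every edge, and starting from the uniform value 4 they decrease toward the same point.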

Note that we should not expect the algorithm to converge for arbitrary $\gamma^{(0)}$. For the update (7) to even be well-defined at time $t$, we require that

$$\sum_{u \in N(i) \setminus j} \Gamma_{ui}^2 \gamma_{ui}^{(t)} < 1, \quad \forall\, \{i,j\} \in \vec{E}.$$

The condition on $\gamma^{(0)}$ in Theorem 2 guarantees this at time $t = 0$, and the theorem guarantees that it continues to hold for all $t > 0$. Similarly, the computation (6) of the estimate $x_j^{(t)}$ requires that

$$\sum_{i \in N(j)} \Gamma_{ij}^2 \gamma_{ij}^{(t)} < 1.$$

The theorem guarantees that this is true for all $t \geq 0$, given a suitable choice of $\gamma^{(0)}$.

V. Convergence of Linear Parameters 

In this section, we will assume that the quadratic parameters $\gamma^{(t)}$ are set to the fixed point $\gamma^*$, and study the evolution of the linear parameters $z^{(t)}$. In this case, the update (8) for the linear parameters takes the particularly simple form

$$z_{ij}^{(t+1)} = \gamma_{ij}^* \Gamma_{ij} \Big( h_i - \sum_{u \in N(i) \setminus j} z_{ui}^{(t)} \Big). \tag{9}$$

This linear equation can be written in vector form as

$$z^{(t+1)} = -Dy + Az^{(t)},$$

where $y \in \mathbb{R}^{|\vec{E}|}$ is a vector with

$$y_{ij} = h_i, \tag{10}$$

$D \in \mathbb{R}^{|\vec{E}| \times |\vec{E}|}$ is a diagonal matrix with

$$D_{ij,ij} = -\gamma_{ij}^* \Gamma_{ij}, \tag{11}$$

and $A \in \mathbb{R}^{|\vec{E}| \times |\vec{E}|}$ is a matrix such that

$$A_{ij,uk} = \begin{cases} -\gamma_{ij}^* \Gamma_{ij} & \text{if } (u,i), (i,j) \in E,\ k = i,\ j \neq u, \\ 0 & \text{otherwise.} \end{cases} \tag{12}$$

If the spectral radius of $A$ is less than 1, then we have convergence of $z^{(t)}$ independent of the initial condition $z^{(0)}$ by

$$\lim_{t \to \infty} z^{(t)} = -\sum_{t=0}^{\infty} A^t D y.$$

We will show that existence of a convex decomposition of $f(\cdot)$ is a sufficient condition for this to be true. In order to proceed, we first introduce the notion of walk-summability.

A. Walk-Summability 

Note that the optimization problem we are considering,

$$\min_x \; \frac{1}{2} x^\top \Gamma x - h^\top x,$$

has the unique solution

$$x^* = \Gamma^{-1} h.$$

Define $R = I - \Gamma$, so $R_{ii} = 0$ and $R_{ij} = -\Gamma_{ij}$, if $i \neq j$. If we assume that the matrix $R$ has spectral radius less than 1, we can express the solution $x^*$ by the infinite series

$$x^* = \sum_{t=0}^{\infty} R^t h. \tag{13}$$
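
The series (13) can be evaluated by accumulating partial sums through the recursion $x \leftarrow Rx + h$, which, for a matrix $\Gamma$ with unit diagonal, coincides with the Jacobi iteration. The sketch below is illustrative only; the name `neumann_solve` and the fixed number of terms are assumptions of the example.

```python
def neumann_solve(Gamma, h, terms=200):
    """Approximate x* = sum_{t>=0} R^t h, with R = I - Gamma.

    Starting from x = 0, after k passes x equals the partial sum of the
    first k terms of the series. Assumes the spectral radius of R is
    below 1; otherwise the partial sums need not converge.
    """
    n = len(h)
    x = [0.0] * n
    for _ in range(terms):
        # (R x)_i = x_i - sum_j Gamma[i][j] * x_j
        x = [h[i] + x[i] - sum(Gamma[i][j] * x[j] for j in range(n))
             for i in range(n)]
    return x
```

The number of terms needed for a given accuracy grows as the spectral radius of $R$ approaches 1.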



The idea of walk-sums, introduced by Johnson et al. [9], allows us to interpret this solution as a sum of weights of walks on the graph.

To be precise, define a walk of length $k$ to be a sequence of vertices

$$w = \{w_0, \ldots, w_k\},$$

such that $(w_i, w_{i+1}) \in E$, for all $0 \leq i < k$. Given a walk $w$, we can define a weight by the product

$$\rho(w) = R_{w_0 w_1} R_{w_1 w_2} \cdots R_{w_{k-1} w_k}.$$

(We adopt the convention that $\rho(w) = 1$ for walks of length 0, which consist of a single vertex.) Given a set of walks $\mathcal{W}$, we define the weight of the set to be the sum of the weights of the walks in the set, that is

$$\rho(\mathcal{W}) = \sum_{w \in \mathcal{W}} \rho(w).$$

Define $\mathcal{W}_{i \to j}$ to be the (infinite) set of all walks from vertex $i$ to vertex $j$. If the quantity $\rho(\mathcal{W}_{i \to j})$ were well-defined, then, examining the structure of $R$ and (13), we would have

$$x_j^* = \sum_{i \in V} \rho(\mathcal{W}_{i \to j}) \, h_i. \tag{14}$$
Definition 3. (Walk-Summability) 

Given a matrix $\Gamma \succ 0$ with $\Gamma_{ii} = 1$, define $|R|$ by $|R|_{ij} = |[I - \Gamma]_{ij}|$. We say $\Gamma$ is walk-summable if the spectral radius of $|R|$ is less than 1.

Walk-summability of $\Gamma$ guarantees that the function $\rho(\cdot)$ is well-defined even for infinite sets of walks, since in this case, the series $\sum_{t=0}^{\infty} R^t$ is absolutely convergent. It is not difficult to see that existence of a convex decomposition of $f(\cdot)$ implies walk-summability [9]. More recent work [10] shows that these two conditions are in fact equivalent.
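
Walk-summability can be tested numerically by estimating the spectral radius of $|R|$. The sketch below uses power iteration on the shifted matrix $|R| + I$, a choice made here (not taken from the text) so that the iteration also converges on bipartite graphs, where plain power iteration on $|R|$ can oscillate. It assumes the graph is connected; the function names are hypothetical.

```python
def abs_R_spectral_radius(Gamma, iters=500):
    """Estimate the spectral radius of |R|, where R = I - Gamma.

    Power iteration is run on M = |R| + I. On a connected graph M is a
    nonnegative primitive matrix, so the iteration converges to its
    largest eigenvalue, which equals the spectral radius of |R| plus 1.
    """
    n = len(Gamma)
    M = [[abs((1.0 if i == j else 0.0) - Gamma[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)               # v has maximum entry 1, so this tracks the growth
        v = [wi / lam for wi in w]
    return lam - 1.0               # remove the shift


def is_walk_summable(Gamma):
    """True if the estimated spectral radius of |R| is below 1."""
    return abs_R_spectral_radius(Gamma) < 1.0
```

The estimate is only as accurate as the number of iterations allows, so values very close to 1 should be treated with care.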

We introduce a different weight function $\nu(\cdot)$ defined by

$$\nu(w) = \gamma^*_{w_0 w_1} R_{w_0 w_1} \cdots \gamma^*_{w_{|w|-1} w_{|w|}} R_{w_{|w|-1} w_{|w|}}.$$

$\nu(\cdot)$ can be extended to sets of walks as before. However, we interpret this function only over non-backtracking walks, where a walk $w$ is non-backtracking if $w_{i-1} \neq w_{i+1}$, for $1 \leq i < |w|$. Denote by $\mathcal{W}^{nb}$ the set of non-backtracking walks. The following combinatorial lemma establishes a correspondence between $\nu(\cdot)$ on non-backtracking walks and $\rho(\cdot)$.

Lemma 1. Assume that $f(\cdot)$ is convex decomposable. For each $w \in \mathcal{W}^{nb}$, there exists a set of walks $\mathcal{W}_w$, all terminating at the same vertex as $w$, such that

$$\nu(w) = \rho(\mathcal{W}_w).$$

Further, if $w' \in \mathcal{W}^{nb}$ and $w' \neq w$, then $\mathcal{W}_w$ and $\mathcal{W}_{w'}$ are disjoint.

Proof: See Appendix B. ■

The above lemma reveals that $\nu(\cdot)$ is well-defined on infinite sets of non-backtracking walks. Indeed, if $\mathcal{W} \subset \mathcal{W}^{nb}$,

$$\sum_{w \in \mathcal{W}} |\nu(w)| = \sum_{w \in \mathcal{W}} |\rho(\mathcal{W}_w)| \leq \sum_{w \in \mathcal{W}} \sum_{u \in \mathcal{W}_w} |\rho(u)|, \tag{15}$$

and the latter sum is finite since $\Gamma$ is walk-summable.

We can make the correspondence between $\nu(\cdot)$ and $\rho(\cdot)$ stronger with the following lemma.

Lemma 2. Assume that $f(\cdot)$ is convex decomposable. If we define $\mathcal{W}^{nb}_{i \to r}$ to be the set of all non-backtracking walks from vertex $i$ to vertex $r$, we have

$$\rho(\mathcal{W}_{i \to r}) = \frac{\nu(\mathcal{W}^{nb}_{i \to r})}{1 - \sum_{u \in N(r)} R_{ur}^2 \gamma^*_{ur}}.$$

Proof: See Appendix C. ■

B. Spectral Radius of A 

Examining the structure of the matrix $A$ from (12), it is clear that if $\mathcal{W}^{nb,t}_{uk \to ij}$ is defined to be the set of all length $t$ non-backtracking walks $w$ with $\{w_0, w_1\} = \{u, k\}$ and $\{w_{t-1}, w_t\} = \{i, j\}$, then

$$[A^t D]_{ij,uk} = \nu(\mathcal{W}^{nb,t+1}_{uk \to ij}).$$

Thus, if $\mathcal{W}^{nb}_{uk \to ij}$ is the set of all non-backtracking walks $w$ of length at least 1 satisfying $\{w_0, w_1\} = \{u, k\}$ and $\{w_{|w|-1}, w_{|w|}\} = \{i, j\}$,

$$\sum_{t=0}^{\infty} [A^t D]_{ij,uk} = \sum_{t=0}^{\infty} \nu(\mathcal{W}^{nb,t+1}_{uk \to ij}) = \sum_{w \in \mathcal{W}^{nb}_{uk \to ij}} \nu(w).$$

Lemma 1 and (15) assure us that the latter sum must be absolutely convergent. Thus, we have established the following lemma.

Lemma 3. Assume that $f(\cdot)$ is convex decomposable. The spectral radius of $|A|$ is less than 1.

C. Exactness 

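From Lemma 3, we have

$$z^{(\infty)} = \lim_{t \to \infty} z^{(t)} = -\sum_{t=0}^{\infty} A^t D y.$$

For each vertex $j$, define the quantity

$$\xi_j = \frac{1}{1 - \sum_{i \in N(j)} \Gamma_{ij}^2 \gamma^*_{ij}}.$$

In this case, the estimate $x_j^{(t)}$ for each vertex $j$, defined by (6), converges to

$$\begin{aligned}
x_j^{(\infty)} &= \xi_j \Big( h_j - \sum_{i \in N(j)} z_{ij}^{(\infty)} \Big) \\
&= \xi_j \Big( h_j + \sum_{i \in N(j)} \sum_{\{u,k\} \in \vec{E}} \sum_{t=0}^{\infty} [A^t D]_{ij,uk} \, h_u \Big) \\
&= \xi_j \Big( h_j + \sum_{u \in V} h_u \, \nu(\mathcal{W}^{nb,+}_{u \to j}) \Big).
\end{aligned}$$

Here, we define $\mathcal{W}^{nb,+}_{u \to j}$ to be the set of non-backtracking walks of length at least 1 starting at $u$ and ending at $j$. Note that if $u \neq j$, then a non-backtracking walk from $u$ to $j$ must have length at least 1. Thus,

$$\nu(\mathcal{W}^{nb,+}_{u \to j}) = \nu(\mathcal{W}^{nb}_{u \to j}).$$

If $u = j$, there is a single non-backtracking walk of length 0 from $j$ to $j$, namely $w = \{j\}$, and $\nu(w) = 1$. Thus,

$$\nu(\mathcal{W}^{nb}_{j \to j}) = 1 + \nu(\mathcal{W}^{nb,+}_{j \to j}).$$

Hence,

$$x_j^{(\infty)} = \xi_j \sum_{u \in V} h_u \, \nu(\mathcal{W}^{nb}_{u \to j}).$$

Comparing with Lemma 2 and (14), we have

$$x_j^{(\infty)} = \sum_{u \in V} h_u \, \rho(\mathcal{W}_{u \to j}) = x_j^*.$$

Thus, $x^{(\infty)} = x^*$.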

Putting together the results in this section, we have the following theorem.

Theorem 3. (Linear Parameter Convergence)

Assume that $f(\cdot)$ is convex decomposable and that $\gamma^{(0)} = \gamma^*$. Then, for arbitrary initial conditions $z^{(0)}$, the linear parameters $z^{(t)}$ converge. Further, the corresponding estimates $x^{(t)}$ converge to the global optimum $x^*$.

VI. Overall Convergence 

In Section IV, we established the convergence of the quadratic parameters $\gamma^{(t)}$. In Section V, we established the convergence of the linear parameters $z^{(t)}$ assuming the quadratic parameters were set to their fixed point. Here, we will combine these results in order to prove Theorem 1, which establishes convergence of the full min-sum algorithm, where the linear parameters evolve jointly with the quadratic parameters.

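It suffices to establish convergence of the linear parameters $z^{(t)}$. Define the matrix $A^{(t)} \in \mathbb{R}^{|\vec{E}| \times |\vec{E}|}$ by

$$A^{(t)}_{ij,uk} = \begin{cases} -\gamma_{ij}^{(t+1)} \Gamma_{ij} & \text{if } (u,i), (i,j) \in E,\ k = i,\ j \neq u, \\ 0 & \text{otherwise.} \end{cases}$$

Define the diagonal matrix $D^{(t)} \in \mathbb{R}^{|\vec{E}| \times |\vec{E}|}$ by $D^{(t)}_{ij,ij} = -\gamma_{ij}^{(t+1)} \Gamma_{ij}$. Then, the min-sum update (8) becomes

$$z^{(t+1)} = -D^{(t)} y + A^{(t)} z^{(t)},$$

where $y$ is defined by (10). From Theorem 2, it is clear that $A^{(t)} \to A$ and $D^{(t)} \to D$ (where $A$ and $D$ are defined by (12) and (11), respectively).

From Lemma 3, the spectral radius of $|A|$ is less than 1. Hence, there is a vector norm $\|\cdot\|$ on $\mathbb{R}^{|\vec{E}|}$ and a corresponding induced operator norm such that $\|A\| < \alpha$, for some $\alpha < 1$ [14]. Pick $K_1$ sufficiently large so that $\|A^{(t)}\| < \alpha$ for all $t \geq K_1$. Then, the series

$$\sum_{s=0}^{\infty} (A^{(t)})^s$$

converges for $t \geq K_1$. Set

$$w^{(t)} = -\sum_{s=0}^{\infty} (A^{(t)})^s D^{(t)} y = -(I - A^{(t)})^{-1} D^{(t)} y,$$

$$z^{(\infty)} = -\sum_{s=0}^{\infty} A^s D y = -(I - A)^{-1} D y.$$

Then, for $t \geq K_1$,

$$\begin{aligned}
\|z^{(t+1)} - z^{(\infty)}\| &\leq \|A^{(t)} (z^{(t)} - w^{(t)})\| + \|z^{(\infty)} - w^{(t)}\| \\
&\leq \alpha \|z^{(t)} - w^{(t)}\| + \|z^{(\infty)} - w^{(t)}\| \\
&\leq \alpha \|z^{(t)} - z^{(\infty)}\| + (1 + \alpha) \|z^{(\infty)} - w^{(t)}\|.
\end{aligned}$$

Since $w^{(t)} \to z^{(\infty)}$, for any $\epsilon > 0$ we can pick $K_2 \geq K_1$ so that if $t \geq K_2$, $\|w^{(t)} - z^{(\infty)}\| < \epsilon$. Then, for $t \geq K_2$,

$$\|z^{(t+1)} - z^{(\infty)}\| < \alpha \|z^{(t)} - z^{(\infty)}\| + (1 + \alpha) \epsilon.$$

Repeating over $t$,

$$\|z^{(t)} - z^{(\infty)}\| < \alpha^{t - K_2} \|z^{(K_2)} - z^{(\infty)}\| + \frac{1 + \alpha}{1 - \alpha} \epsilon.$$

Thus,

$$\limsup_{t \to \infty} \|z^{(t)} - z^{(\infty)}\| \leq \frac{1 + \alpha}{1 - \alpha} \epsilon.$$

Since $\epsilon$ is arbitrary, it is clear that $z^{(t)}$ converges to $z^{(\infty)}$.

The fact that $x^{(t)}$ converges to $x^*$ follows from the same argument as in Theorem 3.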

A. Asynchronous Convergence 

The work we have presented thus far considers the conver- 
gence of a synchronous variation of the min-sum algorithm. In 
that case, every component of each of the parameter vectors
$\gamma^{(t)}$ and $z^{(t)}$ is updated at every time step. However, the
min-sum algorithm has a naturally parallel nature and can 
be applied in distributed contexts. In such implementations, 
different processors may be responsible for updating different 
components of the parameter vector. Further, these processors 
may not be able to communicate at every time step, and thus 
may have insufficient information to update the corresponding 
components of the parameter vectors. There may not even be 
a notion of a shared clock. As such, it is useful to consider 
the convergence properties of the min-sum algorithm under an 
asynchronous model of computation. 
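
As a rough illustration of asynchronous operation, the sketch below simulates a simplified special case: at each step, every directed-edge parameter pair is independently either updated in place from the current values of its neighbors or left at its stale value. This is a hypothetical simplification, since it randomizes updates per edge rather than per vertex and models no explicit communication delays. The name `async_min_sum`, the update probability, and the seed are assumptions of the example, as is the form of the parameter updates (the same form as in the synchronous case, with unit-diagonal $\Gamma$ and zero initial parameters).

```python
import random


def async_min_sum(Gamma, h, steps=2000, p_update=0.5, seed=0):
    """Randomized in-place min-sum updates for f(x) = 0.5 x'Gamma x - h'x.

    Each directed-edge parameter pair is refreshed with probability
    `p_update` at every step and otherwise keeps its previous value.
    Returns the estimates computed from the final parameters.
    """
    rng = random.Random(seed)
    n = len(h)
    nbrs = [[j for j in range(n) if j != i and Gamma[i][j] != 0.0]
            for i in range(n)]
    edges = [(i, j) for i in range(n) for j in nbrs[i]]
    gamma = {e: 0.0 for e in edges}
    z = {e: 0.0 for e in edges}

    for _ in range(steps):
        for (i, j) in edges:
            if rng.random() >= p_update:
                continue  # skipped this step: the stale value is kept
            s = sum(Gamma[u][i] ** 2 * gamma[(u, i)] for u in nbrs[i] if u != j)
            lin = h[i] - sum(z[(u, i)] for u in nbrs[i] if u != j)
            gamma[(i, j)] = 1.0 / (1.0 - s)
            z[(i, j)] = Gamma[i][j] * lin / (1.0 - s)

    return [(h[j] - sum(z[(i, j)] for i in nbrs[j])) /
            (1.0 - sum(Gamma[i][j] ** 2 * gamma[(i, j)] for i in nbrs[j]))
            for j in range(n)]
```

Because every parameter is refreshed infinitely often with probability one, this schedule is consistent in spirit with a totally asynchronous model, although it does not exercise delayed or out-of-order communication.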

In such a model, we assume that a processor associated with vertex $i$ is responsible for updating the parameters $\gamma_{ij}^{(t)}$ and $z_{ij}^{(t)}$ for each neighbor $j \in N(i)$. We define $T^i$ to be the set of times at which these parameters are updated. We define $0 \leq \tau_{ji}(t) \leq t$ to be the last time the processor at vertex $j$ communicated to the processor at vertex $i$. Then, the parameters evolve according to

$$\gamma_{ij}^{(t+1)} = \begin{cases} \dfrac{1}{1 - \sum_{u \in N(i) \setminus j} \Gamma_{ui}^2 \gamma_{ui}^{(\tau_{ui}(t))}} & \text{if } t \in T^i, \\ \gamma_{ij}^{(t)} & \text{otherwise,} \end{cases}$$

$$z_{ij}^{(t+1)} = \begin{cases} \dfrac{\Gamma_{ij} \Big( h_i - \sum_{u \in N(i) \setminus j} z_{ui}^{(\tau_{ui}(t))} \Big)}{1 - \sum_{u \in N(i) \setminus j} \Gamma_{ui}^2 \gamma_{ui}^{(\tau_{ui}(t))}} & \text{if } t \in T^i, \\ z_{ij}^{(t)} & \text{otherwise.} \end{cases}$$

Note that the processor at vertex $i$ is not computing its updates with the most recent values of the other components of the parameter vector. It uses the values of components from the last time it communicated with a particular processor.

We will make the assumption of total asynchronism [13]: we assume that each set $T^i$ is infinite, and that if $\{t_k\}$ is a sequence in $T^i$ tending to infinity, then $\lim_{k \to \infty} \tau_{ij}(t_k) = \infty$, for each neighbor $j \in N(i)$. This mild assumption guarantees that each component is updated infinitely often, and that processors eventually communicate with neighboring processors. It allows for arbitrary delays in communication, and even the out-of-order arrival of messages between processors.

We can extend the convergence result of Theorem 1 to this setting. The proof is straightforward given the results we have already established and standard results on asynchronous algorithms (see [13], for example). We will provide an outline here. For the convergence of the quadratic parameters, note that the synchronous iteration (7) is a monotone mapping (see Lemma 4 in Appendix A). For such monotone mappings, synchronous convergence implies totally asynchronous convergence by Proposition 6.2.1 in [13]. The linear parameter update equation for the synchronous algorithm has the form

$$z^{(t+1)} = -D^{(t)} y + A^{(t)} z^{(t)}.$$

For $t$ sufficiently large, by the convergence of the quadratic parameters, the matrix $A^{(t)}$ becomes arbitrarily close to $A$. From Lemma 3, the matrix $|A|$ has spectral radius less than one. In this case, by Corollary 2.6.2 in [13], it must correspond to a weighted maximum norm contraction. Then, one can establish asynchronous convergence of the linear parameters by appealing again to Proposition 6.2.1 in [13].

VII. Discussion

The following corollary is a restatement of Theorem 1 in terms of message passing updates of the form (4).

Corollary 1. (Convergence of Message Passing Updates)

Let $\{g_i(\cdot), g_{ij}(\cdot, \cdot)\}$ be a convex decomposition of $f(\cdot)$, and let $\{f_i(\cdot), f_{ij}(\cdot, \cdot)\}$ be a decomposition of $f(\cdot)$ into quadratic functions such that

$$g_{ij}(x_i, x_j) + J^{(0)}_{i \to j}(x_j) + J^{(0)}_{j \to i}(x_i) - f_{ij}(x_i, x_j) \tag{16}$$

is a convex function of $(x_i, x_j)$, for all $(i,j) \in E$. Then, using the decomposition $\{f_i(\cdot), f_{ij}(\cdot, \cdot)\}$ and quadratic initial messages $\{J^{(0)}_{i \to j}(\cdot)\}$, the running estimates $x^{(t)}$ generated by the min-sum algorithm converge. Further,

$$\lim_{t \to \infty} f(x^{(t)}) = \min_x f(x).$$


The work of Johnson et al. [9] identifies existence of a convex decomposition of the objective as an important condition for such convergence results and also introduces the notion of walk-summability. However, the convergence analysis presented there only establishes a special case of the above corollary, where

$$f_{ij}(x_i, x_j) = \Gamma_{ij} x_i x_j, \quad \forall\, (i,j) \in E,$$

and the initial messages are zero-valued. In addition, they present a quadratic program that is not convex decomposable, and where the min-sum algorithm fails to converge.

The prior work of the current authors in [7] considers a
case that arises in distributed averaging applications. There,
convergence is established when

fij(xi,Xj) = -Tij(xi - Xj) 2 . Tij > 0, V (i,j) G E, 

J$%(-) is convex, V {i,j} G E, 

This is also a special case of Corollary Q] The work in Q 
further develops complexity bounds on the rate of convergence 
in certain special cases. Study of the rate of convergence of 
the min-sum algorithm in more general cases remains an open 
issue. 

Note that the main convexity condition (16) of Corollary 1
can also be interpreted in the context of general convex 
objectives. While our analysis is very specific to the quadratic 
case, the result may be illuminating in the broader context of 
convex programs. 

Finally, although every quadratic program can be decom- 
posed over pairwise cliques, as we assume in this paper, there 
may also be decompositions involving higher order cliques. 
Our analysis does not apply to that case, and this is an 
interesting question for future consideration. 

Appendix A
Proof of Theorem 2

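Define the domain

$$\mathcal{D} = \Big\{ \gamma \;:\; \sum_{u \in N(i) \setminus j} \Gamma_{ui}^2 \gamma_{ui} < 1, \ \forall\, \{i,j\} \in E \Big\},$$

and the operator $F$ on $\mathcal{D}$ by

$$F_{ij}(\gamma) = \frac{1}{1 - \sum_{u \in N(i) \setminus j} \Gamma_{ui}^2 \gamma_{ui}}, \quad \forall\, \{i,j\} \in E.$$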

This operator corresponds to a single min-sum update (0 
of the quadratic parameters. We will first establish some 
properties of this operator. 
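
As a minimal sketch of how this operator acts, the following Python snippet applies it repeatedly on a small example. The 3-cycle, the common edge value 0.3, and the names `neighbors`, `Gamma`, and `F` are illustrative assumptions, not taken from the paper; the messages are indexed by directed edges `(i, j)`.

```python
# Hypothetical 3-cycle with Gamma_ij = 0.3 on every edge (illustrative values only).
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
Gamma = {(i, j): 0.3 for i in neighbors for j in neighbors[i]}

def F(gamma):
    """One application of the operator:
    F_ij(gamma) = 1 / (1 - sum over u in N(i), u != j, of Gamma_ui^2 * gamma_ui)."""
    return {
        (i, j): 1.0 / (1.0 - sum(Gamma[(u, i)] ** 2 * gamma[(u, i)]
                                 for u in neighbors[i] if u != j))
        for (i, j) in gamma
    }

# Start from gamma = 0, which lies in the domain, and iterate.
gamma = {e: 0.0 for e in Gamma}
for _ in range(200):
    gamma = F(gamma)

# On this symmetric example every component solves g = 1 / (1 - 0.09 g),
# whose smaller root is 10/9.
residual = max(abs(g - 10.0 / 9.0) for g in gamma.values())
```

On this symmetric example the scalar equation has two roots, 10/9 and 10; the iteration started at zero settles on the smaller one.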

Lemma 4. The following hold:

(i) The operator $F(\cdot)$ is continuous.

(ii) The operator $F(\cdot)$ is monotonic. That is, if $\gamma, \gamma' \in \mathcal{D}$ and
$\gamma \le \gamma'$, then $F(\gamma) \le F(\gamma')$.

(iii) The operator $F(\cdot)$ is positive. That is, if $\gamma \in \mathcal{D}$, $F(\gamma) > 0$.

(iv) If $v \in \mathcal{V}$ and $\gamma \le v$,

$$\alpha F(\gamma) < (\alpha - 1)v + F(v - \alpha(v - \gamma)), \quad \forall\, \alpha > 1.$$

(v) If $v \in \mathcal{V}$, $F(v) < v$.

Proof: Parts (i)-(iii) follow from the corresponding properties
of the function

$$x \mapsto \frac{1}{1 - x},$$

for $x \in (-\infty, 1)$. Part (v) follows from setting $\gamma = v$ in
Part (iv).



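Part (iv) remains. For notational convenience, define

$$R_{ij}(\gamma) = 1 - \sum_{u \in N(i) \setminus j} \Gamma_{ui}^2 \gamma_{ui},$$

$$z = v - \gamma \ge 0.$$

We have

$$\begin{aligned}
(\alpha - 1)v_{ij} &+ F_{ij}(v - \alpha(v - \gamma)) - \alpha F_{ij}(\gamma) \\
&= (\alpha - 1)v_{ij} + \frac{1}{R_{ij}(v - \alpha z)} - \frac{\alpha}{R_{ij}(v - z)} \\
&= \frac{1}{R_{ij}(v - \alpha z) R_{ij}(v - z)} \times \Big\{ (\alpha - 1)v_{ij} R_{ij}(v - \alpha z) R_{ij}(v - z) \\
&\qquad\qquad + R_{ij}(v - z) - \alpha R_{ij}(v - \alpha z) \Big\}.
\end{aligned}$$

Denote the numerator of the last expression by $A$. Since the
denominator is positive, it suffices to show that $A > 0$. Define

$$V_i = 1 - \sum_{u \in N(i)} \Gamma_{ui}^2 v_{ui} > 0,$$

$$S_{ij} = \sum_{u \in N(i) \setminus j} \Gamma_{ui}^2 z_{ui} \ge 0.$$

Note that

$$R_{ij}(v - \alpha z) = V_i + \Gamma_{ij}^2 v_{ji} + \alpha S_{ij},$$

$$R_{ij}(v - z) = V_i + \Gamma_{ij}^2 v_{ji} + S_{ij}.$$

Since $v \in \mathcal{V}$, we have $\Gamma_{ij}^2 v_{ij} v_{ji} \ge 1$, for each $\{i,j\} \in E$.
Then, we can derive the chain of inequalities

$$\begin{aligned}
A &= (\alpha - 1)v_{ij}(V_i + \Gamma_{ij}^2 v_{ji} + \alpha S_{ij})(V_i + \Gamma_{ij}^2 v_{ji} + S_{ij}) \\
&\qquad + V_i + \Gamma_{ij}^2 v_{ji} + S_{ij} - \alpha(V_i + \Gamma_{ij}^2 v_{ji} + \alpha S_{ij}) \\
&\ge (\alpha - 1)v_{ij}(V_i + \alpha S_{ij})(V_i + \Gamma_{ij}^2 v_{ji} + S_{ij}) \\
&\qquad + (\alpha - 1)(V_i + \Gamma_{ij}^2 v_{ji} + S_{ij}) + V_i + \Gamma_{ij}^2 v_{ji} + S_{ij} \\
&\qquad - \alpha(V_i + \Gamma_{ij}^2 v_{ji} + \alpha S_{ij}) \\
&= (\alpha - 1)v_{ij}(V_i + \alpha S_{ij})(V_i + \Gamma_{ij}^2 v_{ji} + S_{ij}) - \alpha(\alpha - 1)S_{ij} \\
&\ge (\alpha - 1)v_{ij}(V_i + \alpha S_{ij})(V_i + S_{ij}) \\
&\qquad + (\alpha - 1)(V_i + \alpha S_{ij}) - \alpha(\alpha - 1)S_{ij} \\
&= (\alpha - 1)v_{ij}(V_i + \alpha S_{ij})(V_i + S_{ij}) + (\alpha - 1)V_i \\
&> 0.
\end{aligned}$$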
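
The algebra in Part (iv) can be spot-checked numerically. The sketch below is only a sanity check under stated assumptions: it reads the conditions the proof relies on as $V_i > 0$, $S_{ij} \ge 0$, and $\Gamma_{ij}^2 v_{ij} v_{ji} \ge 1$, draws random scalars satisfying them, and compares $A$ with the final lower bound $(\alpha - 1)v_{ij}(V_i + \alpha S_{ij})(V_i + S_{ij}) + (\alpha - 1)V_i$. The variable `X` is shorthand introduced here for $\Gamma_{ij}^2 v_{ji}$.

```python
import random

random.seed(0)
ok = True
for _ in range(10_000):
    alpha = 1.0 + random.uniform(0.0, 5.0)   # alpha >= 1
    V = random.uniform(1e-3, 2.0)            # V_i > 0
    S = random.uniform(0.0, 2.0)             # S_ij >= 0
    X = random.uniform(1e-3, 2.0)            # shorthand for Gamma_ij^2 * v_ji
    v = random.uniform(1.0, 3.0) / X         # v_ij, so Gamma_ij^2 * v_ij * v_ji >= 1

    R_az = V + X + alpha * S                 # R_ij(v - alpha z)
    R_z = V + X + S                          # R_ij(v - z)

    # Numerator A of the combined fraction.
    A = (alpha - 1) * v * R_az * R_z + R_z - alpha * R_az
    # Final expression in the chain of inequalities.
    bound = (alpha - 1) * v * (V + alpha * S) * (V + S) + (alpha - 1) * V

    # A equals the original difference times the (positive) denominator.
    direct = ((alpha - 1) * v + 1.0 / R_az - alpha / R_z) * R_az * R_z
    ok = ok and abs(A - direct) <= 1e-9 * (1.0 + abs(A))
    ok = ok and A >= bound - 1e-9 * (1.0 + abs(A)) and bound >= 0.0
```

A random check of this kind does not replace the proof; it only guards against sign or index slips when transcribing the chain.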



We are now ready to prove Theorem 2.

Theorem 2. Assume that $f(\cdot)$ is convex decomposable. The
system of equations

$$\gamma_{ij} = \frac{1}{1 - \sum_{u \in N(i) \setminus j} \Gamma_{ui}^2 \gamma_{ui}}, \quad \forall\, \{i,j\} \in E,$$

has a solution $\gamma^*$ such that

$$0 < \gamma^* < v, \quad \forall\, v \in \mathcal{V}.$$

Moreover, $\gamma^*$ is the unique such solution.

If we initialize the min-sum algorithm so that $\gamma^{(0)} \le v$, for
some $v \in \mathcal{V}$, then $0 < \gamma^{(t)} < v$, for all $t > 0$, and

$$\lim_{t \to \infty} \gamma^{(t)} = \gamma^*.$$

Proof: Pick some $v \in \mathcal{V}$. Then, $F(v) < v$ from Part (v)
of Lemma 4. Thus, we have $F^t(v) \le F^{t-1}(v)$, for all $t > 0$,
by monotonicity. (Here, $F^t(\cdot)$ denotes $t$ applications of the
operator $F(\cdot)$.) Then, the sequence $\{F^t(v)\}$ is a monotonically
decreasing sequence, which by the positivity of $F(\cdot)$, is
bounded below by zero. Hence, the limit $F^\infty(v)$ exists. By
continuity, it must be a fixed point of $F(\cdot)$.

Now, note that, by positivity, $0 < F^\infty(v)$. Thus, by monotonicity,
$F^t(0) \le F^\infty(v)$, for all $t \ge 0$. Since $0 < F(0) = 1$,
we have $F^{t-1}(0) \le F^t(0)$, for all $t > 0$, and this sequence
converges to a fixed point $F^\infty(0) \le F^\infty(v)$.

We wish to show that $F^\infty(0) = F^\infty(v)$. Assume otherwise.
Define

$$\beta = \inf\{\alpha \ge 1 \mid v - \alpha(v - F^\infty(v)) \le F^\infty(0)\}.$$

Since $F^\infty(v) < v$, the set in the above infimum is not empty.
Since $F^\infty(0) \le F^\infty(v)$ and $F^\infty(0) \ne F^\infty(v)$, we must have
$\beta > 1$. Then, we have

$$F^\infty(0) \ge v - \beta(v - F^\infty(v)).$$

Applying $F(\cdot)$ and using Part (iv) of Lemma 4,

$$\begin{aligned}
F^\infty(0) &\ge F(v - \beta(v - F^\infty(v))) \\
&> \beta F^\infty(v) - (\beta - 1)v \\
&= v - \beta(v - F^\infty(v)).
\end{aligned}$$

This contradicts the definition of $\beta$. Thus, we must have
$F^\infty(0) = F^\infty(v)$.

Set $\gamma^* = F^\infty(0)$. From the above argument, we have
$0 < \gamma^* = F^\infty(v) < v$, for all $v \in \mathcal{V}$. Thus, $\gamma^*$ satisfies
the conditions of the theorem.

Assume there is some other fixed point $\gamma'$ satisfying the
conditions of the theorem. Positivity implies $\gamma' > 0$. Then,
since $0 < \gamma' < v$ for some $v \in \mathcal{V}$, by repeatedly applying
$F(\cdot)$, we have

$$F^t(0) \le \gamma' \le F^t(v),$$

for all $t \ge 0$. Taking a limit as $t \to \infty$, it is clear that $\gamma' = \gamma^*$.

It remains to prove the final statement of the theorem.
Consider $\gamma^{(0)}$, with $\gamma^{(0)} \le v$, for some $v \in \mathcal{V}$. Note that
$0 < F(\gamma^{(0)}) \le F(v) < v$. Then,

$$0 < F^t(0) \le \gamma^{(t+1)} = F^{t+1}(\gamma^{(0)}) \le F^{t+1}(v) < v,$$

for all $t \ge 0$. Taking limits,

$$\lim_{t \to \infty} \gamma^{(t)} = \gamma^*.$$
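
The sandwich argument can be illustrated numerically. The following sketch is only an illustration under assumptions: the triangle, its edge values, and the names `neighbors`, `Gamma`, `F`, and `v` are hypothetical, and `v` is chosen so that the two conditions used in the proof of Lemma 4 hold ($1 - \sum_{u \in N(i)} \Gamma_{ui}^2 v_{ui} > 0$ and $\Gamma_{ij}^2 v_{ij} v_{ji} \ge 1$). Iterates started at 0, at an intermediate point, and at `v` should stay ordered and meet at a common limit.

```python
# Hypothetical triangle with unequal edge values (illustrative only).
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
base = {(0, 1): 0.3, (1, 2): -0.2, (0, 2): 0.25}
Gamma = {**base, **{(j, i): g for (i, j), g in base.items()}}

def F(gamma):
    """F_ij(gamma) = 1 / (1 - sum over u in N(i), u != j, of Gamma_ui^2 * gamma_ui)."""
    return {
        (i, j): 1.0 / (1.0 - sum(Gamma[(u, i)] ** 2 * gamma[(u, i)]
                                 for u in neighbors[i] if u != j))
        for (i, j) in gamma
    }

# v_ij = 1.2 / |Gamma_ij| gives Gamma_ij^2 v_ij v_ji = 1.44 >= 1 and
# 1 - sum_u Gamma_ui^2 v_ui = 1 - 1.2 * sum_u |Gamma_ui| > 0 at every vertex.
v = {e: 1.2 / abs(g) for e, g in Gamma.items()}

lower = {e: 0.0 for e in Gamma}           # F^t(0)
middle = {e: 0.5 * v[e] for e in Gamma}   # some gamma^(0) with 0 <= gamma^(0) <= v
upper = dict(v)                           # F^t(v)

sandwiched = True
for _ in range(100):
    lower, middle, upper = F(lower), F(middle), F(upper)
    sandwiched = sandwiched and all(
        lower[e] <= middle[e] + 1e-12 and middle[e] <= upper[e] + 1e-12
        for e in Gamma
    )

gap = max(upper[e] - lower[e] for e in Gamma)
```

After 100 sweeps the gap between the upper and lower sequences is at rounding level on this example, consistent with a single limit.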



Appendix B
Proof of Lemmas 1 and 2

For the balance of this section, we assume that $f(\cdot)$ admits
a convex decomposition.

In order to prove Lemma 1, we first fix an arbitrary vertex
$r$, and consider an infinite computation tree rooted at a vertex
$\tilde{r}$ corresponding to $r$. Such a tree is constructed in an iterative
process, first starting with a single vertex $\tilde{r}$. At each step,
vertices are added to leaves on the tree corresponding to the
neighbors of the leaf in the original graph other than its parent.
Hence, the tree's vertices consist of replicas of vertices in the
original graph, and the local structure around each vertex is
the same as that in the original graph. We can extend both
functions $\rho(\cdot)$ and $\nu(\cdot)$ to walks on the computation tree by
defining weights on edges in the computation tree according to
the weights of the corresponding edges in the original graph.
We will use the tilde symbol to distinguish vertices and subsets
of the computation tree from those in the underlying graph.

We begin with a lemma. 

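Lemma 5. Given connected vertices $\tilde{i}, \tilde{j}$ in the computation
tree, with labels $i, j$, respectively, let $\tilde{\mathcal{W}}_{\tilde{i} \to \tilde{i} \setminus \tilde{j}}$ be the set of
walks starting at $\tilde{i}$ and returning to $\tilde{i}$ but never crossing the
edge $(\tilde{i}, \tilde{j})$. Then,

$$\rho(\tilde{\mathcal{W}}_{\tilde{i} \to \tilde{i} \setminus \tilde{j}}) = \gamma^*_{ij}.$$

Proof: First, note that walks in $\tilde{\mathcal{W}}_{\tilde{i} \to \tilde{i} \setminus \tilde{j}}$ can be mapped
to disjoint walks on the original graph. Hence, by walk-summability,
the infinite sum

$$\rho(\tilde{\mathcal{W}}_{\tilde{i} \to \tilde{i} \setminus \tilde{j}}) = \sum_{w \in \tilde{\mathcal{W}}_{\tilde{i} \to \tilde{i} \setminus \tilde{j}}} \rho(w)$$

converges absolutely.

Now, define the set $\tilde{\mathcal{W}}^d_{\tilde{i} \to \tilde{i} \setminus \tilde{j}}$ to be the set of walks in $\tilde{\mathcal{W}}_{\tilde{i} \to \tilde{i} \setminus \tilde{j}}$
that travel at most a distance $d$ away from $\tilde{i}$ in the computation
tree. A walk $w \in \tilde{\mathcal{W}}^d_{\tilde{i} \to \tilde{i} \setminus \tilde{j}}$ can be decomposed into a series
of traversals to neighbors $u \in N(i) \setminus j$, self-returning walks
from $u$ to $u$ that do not cross $(u, i)$ and travel at most distance
$d - 1$ from $u$, and then returns to $i$. Letting $t$ index the total
number of such traversals, we have the expression

$$\rho(\tilde{\mathcal{W}}^d_{\tilde{i} \to \tilde{i} \setminus \tilde{j}}) = \sum_{t=0}^{\infty} \Big( \sum_{u \in N(i) \setminus j} R_{ui}^2 \, \rho(\tilde{\mathcal{W}}^{d-1}_{\tilde{u} \to \tilde{u} \setminus \tilde{i}}) \Big)^t.$$

By walk-summability, this infinite sum must converge. Thus,

$$\rho(\tilde{\mathcal{W}}^d_{\tilde{i} \to \tilde{i} \setminus \tilde{j}}) = \frac{1}{1 - \sum_{u \in N(i) \setminus j} R_{ui}^2 \, \rho(\tilde{\mathcal{W}}^{d-1}_{\tilde{u} \to \tilde{u} \setminus \tilde{i}})}.$$

By the symmetry of the computation tree, the quantity
$\rho(\tilde{\mathcal{W}}^d_{\tilde{i} \to \tilde{i} \setminus \tilde{j}})$ depends only on the labels of $\tilde{i}$ and $\tilde{j}$ in the original
graph. Set $\gamma^{(0)}_{ij} = \rho(\tilde{\mathcal{W}}^0_{\tilde{i} \to \tilde{i} \setminus \tilde{j}}) = 1$ and $\gamma^{(d)}_{ij} = \rho(\tilde{\mathcal{W}}^d_{\tilde{i} \to \tilde{i} \setminus \tilde{j}})$, for
each $(i,j) \in E$ and integer $d > 0$. Then, we have

$$\gamma^{(d)}_{ij} = \frac{1}{1 - \sum_{u \in N(i) \setminus j} R_{ui}^2 \, \gamma^{(d-1)}_{ui}}.$$

By Theorem 2, we have

$$\lim_{d \to \infty} \gamma^{(d)}_{ij} = \gamma^*_{ij}.$$

Then, since $\tilde{\mathcal{W}}^d_{\tilde{i} \to \tilde{i} \setminus \tilde{j}} \subset \tilde{\mathcal{W}}^{d+1}_{\tilde{i} \to \tilde{i} \setminus \tilde{j}}$ and

$$\tilde{\mathcal{W}}_{\tilde{i} \to \tilde{i} \setminus \tilde{j}} = \bigcup_{d=0}^{\infty} \tilde{\mathcal{W}}^d_{\tilde{i} \to \tilde{i} \setminus \tilde{j}},$$

we have

$$\rho(\tilde{\mathcal{W}}_{\tilde{i} \to \tilde{i} \setminus \tilde{j}}) = \lim_{d \to \infty} \rho(\tilde{\mathcal{W}}^d_{\tilde{i} \to \tilde{i} \setminus \tilde{j}}) = \gamma^*_{ij}.$$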
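
The depth-limited walk-sums in this proof can be checked independently on a small example. The sketch below rests on assumptions: the weight of a walk is taken to be the product of the edge weights along it, and the triangle, its weights, and the names `neighbors`, `weight`, `tree_walk_sum`, and `F` are hypothetical. It builds the truncated computation tree explicitly, sums all self-returning walks at the root through the Neumann series $(I - R)^{-1}$, and compares the result with $d + 1$ applications of the update map to zero.

```python
import numpy as np

# Hypothetical triangle; weight[(i, j)] plays the role of the edge weight R_ij.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
w = {(0, 1): 0.3, (1, 2): -0.2, (0, 2): 0.25}
weight = {**w, **{(j, i): r for (i, j), r in w.items()}}

def tree_walk_sum(i, j, depth):
    """Sum of weights of walks i -> i on the computation tree that never
    cross the edge (i, j) and stay within `depth` of the root."""
    labels = [i]              # graph label of each tree vertex; index 0 is the root
    edges = []                # (parent index, child index, edge weight)
    frontier = [(0, j)]       # (tree index, neighbor label that must not be expanded)
    for _ in range(depth):
        next_frontier = []
        for idx, skip in frontier:
            for u in neighbors[labels[idx]]:
                if u == skip:
                    continue
                labels.append(u)
                child = len(labels) - 1
                edges.append((idx, child, weight[(labels[idx], u)]))
                next_frontier.append((child, labels[idx]))
        frontier = next_frontier
    R = np.zeros((len(labels), len(labels)))
    for a, b, r in edges:
        R[a, b] = R[b, a] = r
    # Neumann series: sum_k R^k = (I - R)^{-1} when the spectral radius is below one.
    return np.linalg.inv(np.eye(len(labels)) - R)[0, 0]

def F(gamma):
    """One synchronous update of the quadratic parameters."""
    return {
        (i, j): 1.0 / (1.0 - sum(weight[(u, i)] ** 2 * gamma[(u, i)]
                                 for u in neighbors[i] if u != j))
        for (i, j) in gamma
    }

gamma = {e: 0.0 for e in weight}
checks = []
for d in range(6):
    gamma = F(gamma)          # gamma now holds d + 1 applications of F to zero
    checks.append(all(abs(tree_walk_sum(i, j, d) - gamma[(i, j)]) < 1e-10
                      for (i, j) in gamma))
```

At depth zero the only admissible walk is the trivial one, with weight one, which matches a single application of the update map to zero.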



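We call a walk on the computation tree a shortest-path walk
if it is the unique shortest path between its endpoints. Given
a shortest-path walk $\tilde{p}$, define $\tilde{\mathcal{W}}_{\tilde{p}}$ to be the set of all walks of
the form

$$\{\tilde{p}_0, w^0, \tilde{p}_1, w^1, \ldots, w^{|\tilde{p}|-1}, \tilde{p}_{|\tilde{p}|}\},$$

where $w^i \in \tilde{\mathcal{W}}_{\tilde{p}_i \to \tilde{p}_i \setminus \tilde{p}_{i+1}}$, for $0 \le i < |\tilde{p}|$. Intuitively, these
walks proceed along the path $\tilde{p}$, but at each point $\tilde{p}_i$, they may
also take a self-returning walk from vertex $\tilde{p}_i$ to vertex $\tilde{p}_i$ that
does not cross the edge $(\tilde{p}_i, \tilde{p}_{i+1})$.

Lemma 6. Given a shortest-path walk $\tilde{p}$,

$$\rho(\tilde{\mathcal{W}}_{\tilde{p}}) = \nu(\tilde{p}).$$

Proof:

$$\begin{aligned}
\rho(\tilde{\mathcal{W}}_{\tilde{p}}) &= \sum_{w^0 \in \tilde{\mathcal{W}}_{\tilde{p}_0 \to \tilde{p}_0 \setminus \tilde{p}_1}} \cdots \sum_{w^{|\tilde{p}|-1} \in \tilde{\mathcal{W}}_{\tilde{p}_{|\tilde{p}|-1} \to \tilde{p}_{|\tilde{p}|-1} \setminus \tilde{p}_{|\tilde{p}|}}} \rho(\{\tilde{p}_0, w^0, \tilde{p}_1, w^1, \ldots, w^{|\tilde{p}|-1}, \tilde{p}_{|\tilde{p}|}\}) \\
&= \rho(\tilde{p}) \prod_{i=0}^{|\tilde{p}|-1} \rho(\tilde{\mathcal{W}}_{\tilde{p}_i \to \tilde{p}_i \setminus \tilde{p}_{i+1}}) \\
&= \nu(\tilde{p}).
\end{aligned}$$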



We are now ready to prove Lemma 1.

Lemma 1. Assume that $f(\cdot)$ is convex decomposable. For each
$w \in \mathcal{W}^{\mathrm{nb}}$, there exists a set of walks $\mathcal{W}_w$, all terminating at
the same vertex as $w$, such that

$$\nu(w) = \rho(\mathcal{W}_w).$$

Further, if $w' \in \mathcal{W}^{\mathrm{nb}}$ and $w' \ne w$, then $\mathcal{W}_w$ and $\mathcal{W}_{w'}$ are
disjoint.

Proof: Take a vertex $i$ in the original graph. Given a
walk from $i$ to $r$ in the original graph, there is a unique
corresponding walk from a replica of $i$ to $\tilde{r}$ in the computation
tree. Also notice that non-backtracking walks in the original
graph that terminate at $r$ correspond uniquely to shortest-path
walks in the computation tree that terminate at $\tilde{r}$.

Now, assume that $w \in \mathcal{W}^{\mathrm{nb}}$ terminates at $r$. Let $\tilde{p}$ be the
corresponding shortest-path walk in the computation tree, and
consider the set $\tilde{\mathcal{W}}_{\tilde{p}}$. We will define $\mathcal{W}_w$ to be the set of walks
in the original graph corresponding to $\tilde{\mathcal{W}}_{\tilde{p}}$. From Lemma 6,

$$\nu(w) = \nu(\tilde{p}) = \rho(\tilde{\mathcal{W}}_{\tilde{p}}) = \rho(\mathcal{W}_w).$$

Now, consider another walk $w' \in \mathcal{W}^{\mathrm{nb}}$, $w' \ne w$, that also
terminates at $r$. We would like to show that $\mathcal{W}_w$ and $\mathcal{W}_{w'}$ are
disjoint. Let $\tilde{p}'$ be the shortest-path walk corresponding to $w'$.
Equivalently, we can show that $\tilde{\mathcal{W}}_{\tilde{p}}$ and $\tilde{\mathcal{W}}_{\tilde{p}'}$ are
disjoint. Assume there is some walk $\tilde{u} \in \tilde{\mathcal{W}}_{\tilde{p}} \cap \tilde{\mathcal{W}}_{\tilde{p}'}$. Then,
both $\tilde{p}$ and $\tilde{p}'$ must be the shortest path from the origin of $\tilde{u}$
to $\tilde{r}$. Since shortest paths between a pair of vertices on the
computation tree are unique, we must have $\tilde{p} = \tilde{p}'$ and thus
$w = w'$, which is a contradiction.

Note that we only considered non-backtracking walks terminating
at a fixed vertex $r$. However, our choice of $r$ was
arbitrary; hence we can repeat the construction for each $r \in V$.
Moreover, if $w$ and $w'$ terminate at different vertices $r$ and $r'$,
respectively, the sets $\mathcal{W}_w$ and $\mathcal{W}_{w'}$ will contain only walks that
terminate at $r$ and $r'$, respectively; thus they will be disjoint.

■ 

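Using similar arguments as above, we can prove Lemma 2.

Lemma 2. Assume that $f(\cdot)$ is convex decomposable. If we
define $\mathcal{W}^{\mathrm{nb}}_{i \to r}$ to be the set of all non-backtracking walks from
vertex $i$ to vertex $r$, we have

$$\rho(\mathcal{W}_{i \to r}) = \frac{\nu(\mathcal{W}^{\mathrm{nb}}_{i \to r})}{1 - \sum_{u \in N(r)} R_{ur}^2 \gamma^*_{ur}}.$$

Proof: Consider a walk $w \in \mathcal{W}_{i \to r}$, and let $\tilde{w}$ be the
unique corresponding walk in the computation tree terminating
at $\tilde{r}$. Let $\tilde{p}$ be the unique shortest-path walk corresponding to
$\tilde{w}$. Note that $\tilde{p}$ will originate at a replica of $i$, and end at
$\tilde{r}$. Thus, $\tilde{p}$ uniquely corresponds to a non-backtracking walk
$w' \in \mathcal{W}^{\mathrm{nb}}_{i \to r}$.

Now, $\tilde{w}$ can be uniquely decomposed according to

$$\{\tilde{p}_0, w^0, \tilde{p}_1, w^1, \ldots, w^{|\tilde{p}|-1}, \tilde{p}_{|\tilde{p}|}, v\},$$

where $w^i \in \tilde{\mathcal{W}}_{\tilde{p}_i \to \tilde{p}_i \setminus \tilde{p}_{i+1}}$, for $0 \le i < |\tilde{p}|$, and $v$ is a self-returning
walk from $\tilde{r}$ to $\tilde{r}$. Applying Lemma 6, we have

$$\rho(\mathcal{W}_{i \to r}) = \nu(\mathcal{W}^{\mathrm{nb}}_{i \to r}) \, \rho(\tilde{\mathcal{W}}_{\tilde{r} \to \tilde{r}}),$$

where $\tilde{\mathcal{W}}_{\tilde{r} \to \tilde{r}}$ is the set of self-returning walks from $\tilde{r}$ to $\tilde{r}$.

However, a walk $v \in \tilde{\mathcal{W}}_{\tilde{r} \to \tilde{r}}$ can be uniquely decomposed
into a series of traversals to neighbors $u \in N(\tilde{r})$, self-returning
walks from $u$ to $u$ that do not cross $(u, \tilde{r})$, and then returns
to $\tilde{r}$. Letting $t$ index the total number of such traversals, we
have the expression

$$\rho(\tilde{\mathcal{W}}_{\tilde{r} \to \tilde{r}}) = \sum_{t=0}^{\infty} \Big( \sum_{u \in N(r)} R_{ur}^2 \, \rho(\tilde{\mathcal{W}}_{\tilde{u} \to \tilde{u} \setminus \tilde{r}}) \Big)^t.$$

From Lemma 5,

$$\rho(\tilde{\mathcal{W}}_{\tilde{u} \to \tilde{u} \setminus \tilde{r}}) = \gamma^*_{ur}.$$

Thus,

$$\rho(\mathcal{W}_{i \to r}) = \nu(\mathcal{W}^{\mathrm{nb}}_{i \to r}) \sum_{t=0}^{\infty} \Big( \sum_{u \in N(r)} R_{ur}^2 \gamma^*_{ur} \Big)^t = \frac{\nu(\mathcal{W}^{\mathrm{nb}}_{i \to r})}{1 - \sum_{u \in N(r)} R_{ur}^2 \gamma^*_{ur}}.$$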